Prefix-Shuffled Geometric Suffix Tree
Abstract
Protein structure analysis is one of the most important research topics in the post-genomic era, and faster, more accurate index data structures for protein 3-D structures are highly desirable. The geometric suffix tree is a sophisticated index structure that enables fast and accurate search on protein 3-D structures. With it, we can search 3-D structure databases for all substructures whose RMSD (root mean square deviation) to a given query 3-D structure is no larger than a given bound. In this paper, we propose a new data structure based on the geometric suffix tree whose query performance is much better than that of the original. We call the modified data structure the prefix-shuffled geometric suffix tree (PSGST for short). In our experiments, the PSGST outperforms the geometric suffix tree in most cases. It performs best when the database does not contain many substructures similar to the query; in such cases, queries are sometimes 100 times faster than with the original geometric suffix tree.
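The query problem stated in the abstract can be illustrated with a brute-force baseline. The sketch below is not the geometric suffix tree or the PSGST; it only shows what such an index is meant to answer faster. It assumes that the RMSD is minimized over rigid motions (rotation plus translation), computed here with the Kabsch algorithm, and that NumPy is available. The function names `rmsd_after_superposition` and `naive_search` and the toy coordinates are hypothetical and do not come from the paper.

```python
import numpy as np


def rmsd_after_superposition(p, q):
    """Minimum RMSD between two equal-length 3-D point lists over
    rotation and translation (Kabsch algorithm)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (n, 3) coordinate arrays of equal length")
    # Remove the translation by centering both point sets.
    pc = p - p.mean(axis=0)
    qc = q - q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(pc.T @ qc)
    # Flip one axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = pc @ rot.T - qc
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def naive_search(database, query, bound):
    """Return (name, start) for every contiguous substructure whose RMSD
    to the query is at most `bound`, by scanning every window."""
    m = len(query)
    hits = []
    for name, coords in database.items():
        for start in range(len(coords) - m + 1):
            if rmsd_after_superposition(coords[start:start + m], query) <= bound:
                hits.append((name, start))
    return hits


if __name__ == "__main__":
    # Toy chain of 10 points; the query is a rotated and shifted copy of points 2..6.
    chain = [(i, (i * i) % 7, (3 * i) % 5) for i in range(10)]
    query = [(-y + 10, x - 3, z + 2) for (x, y, z) in chain[2:7]]
    print(naive_search({"demo": chain}, query, bound=1e-6))
```

This scan computes one superposition per window, so its cost grows with the total database length times the query length; an index such as the one described in the abstract aims to avoid examining every window.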
Similar resources
Sparse and Truncated Suffix Trees on Variable-Length Codes
The sparse suffix tree (SST), introduced by Kärkkäinen and Ukkonen (COCOON 1996), is the suffix tree for a subset of all suffixes of an input text T of length n. In this paper, we study a special case in which the input string is a sequence of codewords drawn from a regular prefix code ∆ ⊆ Σ recognized by a finite automaton, and the index points are located on the code boundaries. In this case, we present ...
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
We present a linear-time algorithm to compute the longest common prefix information in suffix arrays. As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information.
Sparse Suffix Tree Construction with Small Space
We consider the problem of constructing a sparse suffix tree (or suffix array) for b suffixes of a given text T of size n, using only O(b) words of space during construction time. Breaking the naive bound of Ω(nb) time for this problem has occupied many algorithmic researchers since a different structure, the (evenly spaced) sparse suffix tree, was introduced by Kärkkäinen and Ukkonen in 1996. ...
Sampled Longest Common Prefix Array
When augmented with the longest common prefix (LCP) array and some other structures, the suffix array can solve many string processing problems in optimal time and space. A compressed representation of the LCP array is also one of the main building blocks in many compressed suffix tree proposals. In this paper, we describe a new compressed LCP representation: the sampled LCP array. We show that...
Suffix Arrays for Structural Strings
The structural match (s-match), originally addressed by the structural suffix tree, helps identify different RNA sequences with the same secondary structure. In this work, we introduce and construct the structural suffix array and structural longest common prefix array, i.e. lightweight suffix data structures for the s-match. Further, we illustrate how to use our data structures to support addi...